Stress detection


Efficient-Husformer: Efficient Multimodal Transformer Hyperparameter Optimization for Stress and Cognitive Loads

Orazaly, Merey, Temirkhanova, Fariza, Park, Jurn-Gyu

arXiv.org Artificial Intelligence

Transformer-based models have gained considerable attention in the field of physiological signal analysis. They leverage long-range dependencies and complex patterns in temporal signals, allowing them to achieve performance superior to traditional RNN and CNN models. However, they incur high computational and memory demands. In this work, we present Efficient-Husformer, a novel Transformer-based architecture developed with hyperparameter optimization (HPO) for multi-class stress detection across two multimodal physiological datasets (WESAD and CogLoad). The main contributions of this work are: (1) the design of a structured search space, targeting effective hyperparameter optimization; (2) a comprehensive ablation study evaluating the impact of architectural decisions; (3) consistent performance improvements over the original Husformer, with the best configuration achieving accuracies of 88.41% and 92.61% (improvements of 13.83% and 6.98%) on the WESAD and CogLoad datasets, respectively. The best-performing configuration is achieved with the (L + dm) or (L + FFN) modality combinations, using a single layer, 3 attention heads, a model dimension of 18/30, and an FFN dimension of 120/30, resulting in a compact model with only about 30k parameters.
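The abstract describes a structured search space over depth, attention heads, model dimension, and FFN dimension. A minimal sketch of enumerating such a grid is shown below; the value ranges are hypothetical and only loosely inspired by the configuration the abstract reports, not the paper's actual search space.

```python
from itertools import product

# Hypothetical search space: the exact ranges used by
# Efficient-Husformer are not specified in the abstract.
search_space = {
    "num_layers": [1, 2],
    "num_heads": [3, 6],
    "d_model": [18, 30],
    "d_ffn": [30, 120],
}

def enumerate_configs(space):
    """Yield every hyperparameter combination in the grid."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(enumerate_configs(search_space))
print(len(configs))  # 2 * 2 * 2 * 2 = 16 candidate configurations
```

Each candidate configuration would then be trained and scored, with the ablation study comparing accuracy against parameter count.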


Fairness-Aware Few-Shot Learning for Audio-Visual Stress Detection

Shelke, Anushka Sanjay, Sneh, Aditya, Adyasha, Arya, Lone, Haroon R.

arXiv.org Artificial Intelligence

Fairness in AI-driven stress detection is critical for equitable mental healthcare, yet existing models frequently exhibit gender bias, particularly in data-scarce scenarios. To address this, we propose FairM2S, a fairness-aware meta-learning framework for stress detection leveraging audio-visual data. FairM2S integrates Equalized Odds constraints during both meta-training and adaptation phases, employing adversarial gradient masking and fairness-constrained meta-updates to effectively mitigate bias. Evaluated against five state-of-the-art baselines, FairM2S achieves 78.1% accuracy while reducing the Equal Opportunity gap to 0.06, demonstrating substantial fairness gains. We also release SAVSD, a smartphone-captured dataset with gender annotations, designed to support fairness research in low-resource, real-world contexts. Together, these contributions position FairM2S as a state-of-the-art approach for equitable and scalable few-shot stress detection in mental health AI. We release our dataset and FairM2S publicly with this paper.
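The Equal Opportunity figure reported above measures how far true-positive rates differ across groups. A minimal illustration of that metric is sketched below; the function name and the binary two-group setup are assumptions, and FairM2S's exact formulation may differ.

```python
def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rates between two groups.

    A smaller gap means the model detects stress equally well for both
    groups among the truly stressed samples. Illustrative only.
    """
    def tpr(g):
        tp = sum(1 for t, p, s in zip(y_true, y_pred, group)
                 if s == g and t == 1 and p == 1)
        pos = sum(1 for t, s in zip(y_true, group) if s == g and t == 1)
        return tp / pos if pos else 0.0
    return abs(tpr(0) - tpr(1))

# Toy example: group 0 has TPR 0.5, group 1 has TPR 1.0
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
group  = [0, 0, 0, 1, 1, 1]
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5
```

A fairness-constrained training loop would penalize this gap (or a differentiable surrogate of it) alongside the classification loss.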


PULSE: Privileged Knowledge Transfer from Electrodermal Activity to Low-Cost Sensors for Stress Monitoring

Zhao, Zihan, Mortazavi, Masood, Yan, Ning

arXiv.org Artificial Intelligence

Electrodermal activity (EDA), the primary signal for stress detection, requires costly hardware often unavailable in real-world wearables. In this paper, we propose PULSE, a framework that utilizes EDA exclusively during self-supervised pretraining, while enabling inference without EDA but with more readily available modalities such as ECG, BVP, ACC, and TEMP. Our approach separates encoder outputs into shared and private embeddings. We align "shared" embeddings across modalities and fuse them into a modality-invariant representation. The "private" embeddings carry modality-specific information to support the reconstruction objective. Pretraining is followed by knowledge transfer where a frozen EDA teacher transfers sympathetic-arousal representations into student encoders. On WESAD, our method achieves strong stress-detection performance, showing that representations of privileged EDA can be transferred to low-cost sensors to improve accuracy while reducing hardware cost.
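The separation of encoder outputs into shared and private embeddings can be illustrated with a simple slice-and-fuse sketch. The dimensions, the averaging fusion, and the function names below are assumptions for illustration; PULSE's actual architecture is not specified at this level of detail in the abstract.

```python
import numpy as np

def split_embedding(z, shared_dim):
    """Split an encoder output into shared and private parts.

    'Shared' parts are aligned across modalities into a
    modality-invariant representation; 'private' parts keep
    modality-specific detail for reconstruction. Illustrative only.
    """
    return z[..., :shared_dim], z[..., shared_dim:]

rng = np.random.default_rng(0)
z_ecg = rng.normal(size=(4, 64))  # batch of 4 ECG encoder outputs
z_bvp = rng.normal(size=(4, 64))  # batch of 4 BVP encoder outputs

shared_ecg, private_ecg = split_embedding(z_ecg, shared_dim=32)
shared_bvp, private_bvp = split_embedding(z_bvp, shared_dim=32)

# One possible modality-invariant fusion: average the shared halves
fused = (shared_ecg + shared_bvp) / 2
print(shared_ecg.shape, private_ecg.shape, fused.shape)
```

At inference time, only the low-cost student encoders (ECG, BVP, ACC, TEMP) would produce these embeddings, with the EDA teacher used solely during pretraining.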


Leveraging Shared Prototypes for a Multimodal Pulse Motion Foundation Model

Mao, Wanting, Xu, Maxwell A, Haresamudram, Harish, Saha, Mithun, Kumar, Santosh, Rehg, James Matthew

arXiv.org Artificial Intelligence

Modeling multi-modal time-series data is critical for capturing system-level dynamics, particularly in biosignals where modalities such as ECG, PPG, EDA, and accelerometry provide complementary perspectives on interconnected physiological processes. While recent self-supervised learning (SSL) advances have improved unimodal representation learning, existing multi-modal approaches often rely on CLIP-style contrastive objectives that overfit to easily aligned features and misclassify valid cross-modal relationships as negatives, resulting in fragmented and non-generalizable embeddings. To overcome these limitations, we propose ProtoMM, a novel SSL framework that introduces a shared prototype dictionary to anchor heterogeneous modalities in a common embedding space. By clustering representations around shared prototypes rather than explicit negative sampling, our method captures complementary information across modalities and provides a coherent "common language" for physiological signals. In this work, we focus on developing a Pulse Motion foundation model with ProtoMM and demonstrate that our approach outperforms contrastive-only and prior multimodal SSL methods, achieving state-of-the-art performance while offering improved interpretability of learned features.

Digital biomarkers (for stress, physical activity, sleep, etc.) obtained from wearable sensors, such as smart watches and smartphones, provide unprecedented opportunities to give individuals novel insights into their states of health and wellness throughout their daily life, along with new tools for managing their health-related behaviors (Rehg et al., 2017). In order to realize this potential, however, it is critical to develop effective models for multi-modal time series biosignal data, so that complementary sensing modalities can be leveraged to overcome the ambiguities and noise that are inherent in wearable signals collected in the field environment. Recently, there has been substantial progress in developing unimodal Foundation Models (FMs) which are pre-trained using large datasets on modalities such as accelerometry (Xu et al.; Yuan et al., 2024), ECG (Abbaspourazad et al., 2023; McKeen et al., 2024), and PPG (Saha et al., 2025; Pillai et al., 2024). These models have demonstrated effective generalization to downstream tasks and have established new benchmarks for performance.
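The core mechanism, clustering modality embeddings around a shared prototype dictionary instead of contrasting against explicit negatives, can be sketched as a soft assignment step. The shapes, temperature, and function name below are hypothetical; ProtoMM's actual objective is more involved than this.

```python
import numpy as np

def prototype_assign(embeddings, prototypes, tau=0.1):
    """Soft-assign embeddings to a shared prototype dictionary.

    Cosine similarities to each prototype are converted to a softmax
    distribution; different modalities assigned to the same prototypes
    land in a common embedding space. Illustrative sketch only.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = e @ p.T / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # rows sum to 1

rng = np.random.default_rng(0)
z_ppg = rng.normal(size=(8, 16))    # 8 PPG window embeddings
protos = rng.normal(size=(32, 16))  # 32 shared prototypes
q = prototype_assign(z_ppg, protos)
print(q.shape)  # (8, 32)
```

A training loss would then encourage the assignments from paired modalities (e.g. PPG and accelerometry of the same window) to agree over these prototypes.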


Dynamic Stress Detection: A Study of Temporal Progression Modelling of Stress in Speech

Lall, Vishakha, Liu, Yisi

arXiv.org Artificial Intelligence

Detecting psychological stress from speech is critical in high-pressure settings. While prior work has leveraged acoustic features for stress detection, most treat stress as a static label. In this work, we model stress as a temporally evolving phenomenon influenced by historical emotional state. We propose a dynamic labelling strategy that derives fine-grained stress annotations from emotional labels and introduce cross-attention-based sequential models--a Unidirectional LSTM and a Transformer Encoder--to capture temporal stress progression. Our approach achieves notable accuracy gains on MuSE (+5%) and StressID (+18%) over existing baselines, and generalises well to a custom real-world dataset. These results highlight the value of modelling stress as a dynamic construct in speech.
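The idea of deriving dynamic stress labels from a history of emotional state can be illustrated with a simple recurrence. The exponential-moving-average formulation, threshold, and function name below are hypothetical stand-ins; the paper's actual labelling strategy may differ substantially.

```python
def dynamic_stress_labels(emotion_scores, alpha=0.7, threshold=0.5):
    """Derive frame-level stress labels from a sequence of emotion scores.

    Each frame's stress state is a history-weighted average of past
    emotional arousal, thresholded into a binary label, so stress
    evolves over time rather than being a static per-utterance label.
    Hypothetical illustration only.
    """
    labels, state = [], 0.0
    for s in emotion_scores:
        state = alpha * state + (1 - alpha) * s  # carry emotional history
        labels.append(int(state >= threshold))
    return labels

# A burst of high arousal only flips the label once history accumulates
print(dynamic_stress_labels([0.2, 0.9, 0.9, 0.9, 0.1, 0.1]))
# [0, 0, 0, 1, 0, 0]
```

Note how the label lags the raw arousal scores: that lag is exactly the temporal progression a sequential model (LSTM or Transformer encoder) can learn to exploit.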


StressTest: Can YOUR Speech LM Handle the Stress?

Yosha, Iddo, Maimon, Gallil, Adi, Yossi

arXiv.org Artificial Intelligence

Sentence stress refers to emphasis on words within a spoken utterance to highlight or contrast an idea. It is often used to imply an underlying intention not explicitly stated. Recent speech-aware language models (SLMs) have enabled direct audio processing, allowing models to access the full richness of speech to perform audio reasoning tasks such as spoken question answering. Despite the crucial role of sentence stress in shaping meaning and intent, it remains largely overlooked in the evaluation and development of SLMs. We address this gap by introducing StressTest, a benchmark designed to evaluate models' ability to distinguish between meanings of speech based on the stress pattern. We evaluate leading SLMs, and find that despite their overall capabilities, they perform poorly on such tasks. Hence, we propose a novel data generation pipeline, and create Stress-17k, a training set that simulates change of meaning implied by stress variation. Results suggest that our fine-tuned model, StresSLM, generalizes well to real recordings and notably outperforms existing SLMs on sentence stress reasoning and detection. Models, code, data, samples - pages.cs.huji.ac.il/adiyoss-lab/stresstest.


Multimodal signal fusion for stress detection using deep neural networks: a novel approach for converting 1D signals to unified 2D images

Hasanpoor, Yasin, Tarvirdizadeh, Bahram, Alipour, Khalil, Ghamari, Mohammad

arXiv.org Artificial Intelligence

This study introduces a novel method that transforms multimodal physiological signals -- photoplethysmography (PPG), galvanic skin response (GSR), and acceleration (ACC) -- into 2D image matrices to enhance stress detection using convolutional neural networks (CNNs). Unlike traditional approaches that process these signals separately or rely on fixed encodings, our technique fuses them into structured image representations that enable CNNs to capture temporal and cross-signal dependencies more effectively. This image-based transformation not only improves interpretability but also serves as a robust form of data augmentation. To further enhance generalization and model robustness, we systematically reorganize the fused signals into multiple formats, combining them in a multi-stage training pipeline. This approach significantly boosts classification performance, with test accuracy improving from 92.57% (using individual signal orderings) to 95.86% when using the combined strategy. While demonstrated here in the context of stress detection, the proposed method is broadly applicable to any domain involving multimodal physiological signals, paving the way for more accurate, personalized, and real-time health monitoring through wearable technologies.
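The basic step of fusing windowed 1D signals into a single 2D matrix for a CNN can be sketched as follows. The per-row min-max normalization, window length, and function name are assumptions for illustration; the paper's actual encoding and signal-ordering schemes are more elaborate.

```python
import numpy as np

def fuse_to_image(ppg, gsr, acc, window=64):
    """Stack windowed PPG, GSR, and ACC signals into one 2D matrix.

    Each signal becomes one row of the 'image', normalized to [0, 1]
    so rows share a common scale; a CNN can then convolve across both
    time (columns) and signals (rows). Minimal sketch only.
    """
    rows = []
    for sig in (ppg, gsr, acc):
        sig = np.asarray(sig, dtype=float)[:window]
        lo, hi = sig.min(), sig.max()
        rows.append((sig - lo) / (hi - lo) if hi > lo else np.zeros_like(sig))
    return np.stack(rows)  # shape: (3, window)

t = np.linspace(0, 1, 64)
img = fuse_to_image(np.sin(2 * np.pi * 5 * t),   # pulse-like PPG
                    np.cumsum(np.ones(64)),      # slowly drifting GSR
                    np.cos(2 * np.pi * 2 * t))   # periodic ACC
print(img.shape)  # (3, 64)
```

Reordering the rows or tiling several windows into a larger matrix then yields the "multiple formats" the abstract uses as data augmentation.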


Protecting Student Mental Health with a Context-Aware Machine Learning Framework for Stress Monitoring

Ovi, Md Sultanul Islam, Hossain, Jamal, Rahi, Md Raihan Alam, Akter, Fatema

arXiv.org Artificial Intelligence

Student mental health is an increasing concern in academic institutions, where stress can severely impact well-being and academic performance. Traditional assessment methods rely on subjective surveys and periodic evaluations, offering limited value for timely intervention. This paper introduces a context-aware machine learning framework for classifying student stress using two complementary survey-based datasets covering psychological, academic, environmental, and social factors. The framework follows a six-stage pipeline involving preprocessing, feature selection (SelectKBest, RFECV), dimensionality reduction (PCA), and training with six base classifiers: SVM, Random Forest, Gradient Boosting, XGBoost, AdaBoost, and Bagging. To enhance performance, we implement ensemble strategies, including hard voting, soft voting, weighted voting, and stacking. Our best models achieve 93.09% accuracy with weighted hard voting on the Student Stress Factors dataset and 99.53% with stacking on the Stress and Well-being dataset, surpassing previous benchmarks. These results highlight the potential of context-integrated, data-driven systems for early stress detection and underscore their applicability in real-world academic settings to support student well-being.
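The weighted hard-voting stage of such a pipeline can be sketched with scikit-learn. The synthetic data, the choice of three base learners, and the weights below are placeholders; the framework's actual features, six base classifiers, and tuned weights differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a survey-based stress dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Weighted hard voting over three of the paper's base learner families:
# each classifier casts a vote, and votes are counted with the weights
vote = VotingClassifier(
    estimators=[("svm", SVC()),
                ("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",
    weights=[1, 2, 1],
)
vote.fit(X_tr, y_tr)
print(round(vote.score(X_te, y_te), 3))
```

Swapping `voting="hard"` for `voting="soft"` (with probability-capable estimators) or wrapping the same base learners in a `StackingClassifier` covers the other ensemble strategies the abstract lists.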


Sugar-Beet Stress Detection using Satellite Image Time Series

Sadbhave, Bhumika Laxman, Vaeth, Philipp, Dejon, Denise, Schorcht, Gunther, Gregorová, Magda

arXiv.org Artificial Intelligence

Satellite Image Time Series (SITS) data has proven effective for agricultural tasks due to its rich spectral and temporal nature. In this study, we tackle the task of stress detection in sugar-beet fields using a fully unsupervised approach. We propose a 3D convolutional autoencoder model to extract meaningful features from Sentinel-2 image sequences, combined with acquisition-date-specific temporal encodings to better capture the growth dynamics of sugar-beets. The learned representations are used in a downstream clustering task to separate stressed from healthy fields. The resulting stress detection system can be directly applied to data from different years, offering a practical and accessible tool for stress detection in sugar-beets.
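One common way to realize an acquisition-date-specific temporal encoding is a sinusoidal embedding of the day of year, so that irregularly spaced Sentinel-2 acquisitions carry their position in the growing season. The encoding below is a generic sketch under that assumption; the paper's exact encoding is not specified in the abstract.

```python
import numpy as np

def date_encoding(day_of_year, dim=8):
    """Sinusoidal encoding of an acquisition date.

    Maps a day of year to a dim-dimensional vector of sines and
    cosines at doubling frequencies, which can be concatenated to
    the autoencoder's features. Illustrative sketch only.
    """
    t = day_of_year / 365.0
    freqs = 2 ** np.arange(dim // 2)      # [1, 2, 4, ...]
    angles = 2 * np.pi * freqs * t
    return np.concatenate([np.sin(angles), np.cos(angles)])

enc = date_encoding(120)  # an acquisition in late April
print(enc.shape)  # (8,)
```

Because the encoding depends only on the calendar date, the same learned representations transfer to sequences from different years, which is what lets the clustering step run on new seasons without retraining.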


Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers

Viswanathan, Janaki, Blatt, Alexander, Hagemann, Konrad, Klakow, Dietrich

arXiv.org Artificial Intelligence

Air traffic control (ATC) demands multi-tasking under time pressure with high consequences of an error. This can induce stress. Detecting stress is a key point in maintaining the high safety standards of ATC. However, processing ATC voice data entails privacy restrictions, e.g., the General Data Protection Regulation (GDPR). Anonymizing the ATC voice data is one way to comply with these restrictions. In this paper, different architectures for stress detection on anonymized ATCO speech are evaluated. Our best networks reach a stress detection accuracy of 93.6% on an anonymized version of the Speech Under Simulated and Actual Stress (SUSAS) dataset and an accuracy of 80.1% on our anonymized ATC simulation dataset. This shows that privacy does not have to be an impediment to building well-performing deep-learning-based models.